{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dense Sentiment Classifier"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook, we build a dense neural net to classify IMDB movie reviews by their sentiment."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/the-deep-learners/deep-learning-illustrated/blob/master/notebooks/dense_sentiment_classifier.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Load dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Using TensorFlow backend.\n"
     ]
    }
   ],
   "source": [
    "import keras\n",
    "from keras.datasets import imdb # new! \n",
    "from keras.preprocessing.sequence import pad_sequences #new!\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense, Flatten, Dropout\n",
    "from keras.layers import Embedding # new!\n",
    "from keras.callbacks import ModelCheckpoint # new! \n",
    "import os # new! \n",
    "from sklearn.metrics import roc_auc_score, roc_curve # new!\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt # new!\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Set hyperparameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# output directory name:\n",
    "output_dir = 'model_output/dense'\n",
    "\n",
    "# training:\n",
    "epochs = 4\n",
    "batch_size = 128\n",
    "\n",
    "# vector-space embedding: \n",
    "n_dim = 64\n",
    "n_unique_words = 5000 # as per Maas et al. (2011); may not be optimal\n",
    "n_words_to_skip = 50 # ditto\n",
    "max_review_length = 100\n",
    "pad_type = trunc_type = 'pre'\n",
    "\n",
    "# neural network architecture: \n",
    "n_dense = 64\n",
    "dropout = 0.5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Load data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a given data set: \n",
    "\n",
    "* the Keras text utilities [here](https://keras.io/preprocessing/text/) quickly preprocess natural language and convert it into an index\n",
    "* the `keras.preprocessing.text.Tokenizer` class may do everything you need in one line:\n",
    "    * tokenize into words or characters\n",
    "    * `num_words`: maximum unique tokens\n",
    "    * filter out punctuation\n",
    "    * lower case\n",
    "    * convert words to an integer index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "(x_train, y_train), (x_valid, y_valid) = imdb.load_data(num_words=n_unique_words, \n",
    "                                                        skip_top=n_words_to_skip) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**N.B.**: If you're using Google Colab and the above line of code throws this error: \n",
    "\n",
    "[ValueError: Object arrays cannot be loaded when allow_pickle=False](https://stackoverflow.com/questions/55890813/how-to-fix-object-arrays-cannot-be-loaded-when-allow-pickle-false-for-imdb-loa)\n",
    "\n",
    "As of May 24th, 2019 you can resolve this error by executing `!pip install numpy==1.16.2` and restarting the runtime (by default, Colab uses a later version of NumPy -- 1.16.3 -- that causes the error)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([list([2, 2, 2, 2, 2, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 2, 173, 2, 256, 2, 2, 100, 2, 838, 112, 50, 670, 2, 2, 2, 480, 284, 2, 150, 2, 172, 112, 167, 2, 336, 385, 2, 2, 172, 4536, 1111, 2, 546, 2, 2, 447, 2, 192, 50, 2, 2, 147, 2025, 2, 2, 2, 2, 1920, 4613, 469, 2, 2, 71, 87, 2, 2, 2, 530, 2, 76, 2, 2, 1247, 2, 2, 2, 515, 2, 2, 2, 626, 2, 2, 2, 62, 386, 2, 2, 316, 2, 106, 2, 2, 2223, 2, 2, 480, 66, 3785, 2, 2, 130, 2, 2, 2, 619, 2, 2, 124, 51, 2, 135, 2, 2, 1415, 2, 2, 2, 2, 215, 2, 77, 52, 2, 2, 407, 2, 82, 2, 2, 2, 107, 117, 2, 2, 256, 2, 2, 2, 3766, 2, 723, 2, 71, 2, 530, 476, 2, 400, 317, 2, 2, 2, 2, 1029, 2, 104, 88, 2, 381, 2, 297, 98, 2, 2071, 56, 2, 141, 2, 194, 2, 2, 2, 226, 2, 2, 134, 476, 2, 480, 2, 144, 2, 2, 2, 51, 2, 2, 224, 92, 2, 104, 2, 226, 65, 2, 2, 1334, 88, 2, 2, 283, 2, 2, 4472, 113, 103, 2, 2, 2, 2, 2, 178, 2]),\n",
       "       list([2, 194, 1153, 194, 2, 78, 228, 2, 2, 1463, 4369, 2, 134, 2, 2, 715, 2, 118, 1634, 2, 394, 2, 2, 119, 954, 189, 102, 2, 207, 110, 3103, 2, 2, 69, 188, 2, 2, 2, 2, 2, 249, 126, 93, 2, 114, 2, 2300, 1523, 2, 647, 2, 116, 2, 2, 2, 2, 229, 2, 340, 1322, 2, 118, 2, 2, 130, 4901, 2, 2, 1002, 2, 89, 2, 952, 2, 2, 2, 455, 2, 2, 2, 2, 1543, 1905, 398, 2, 1649, 2, 2, 2, 163, 2, 3215, 2, 2, 1153, 2, 194, 775, 2, 2, 2, 349, 2637, 148, 605, 2, 2, 2, 123, 125, 68, 2, 2, 2, 349, 165, 4362, 98, 2, 2, 228, 2, 2, 2, 1157, 2, 299, 120, 2, 120, 174, 2, 220, 175, 136, 50, 2, 4373, 228, 2, 2, 2, 656, 245, 2350, 2, 2, 2, 131, 152, 491, 2, 2, 2, 2, 1212, 2, 2, 2, 371, 78, 2, 625, 64, 1382, 2, 2, 168, 145, 2, 2, 1690, 2, 2, 2, 1355, 2, 2, 2, 52, 154, 462, 2, 89, 78, 285, 2, 145, 95]),\n",
       "       list([2, 2, 2, 2, 2, 2, 2, 2, 249, 108, 2, 2, 2, 54, 61, 369, 2, 71, 149, 2, 2, 112, 2, 2401, 311, 2, 2, 3711, 2, 75, 2, 1829, 296, 2, 86, 320, 2, 534, 2, 263, 4821, 1301, 2, 1873, 2, 89, 78, 2, 66, 2, 2, 360, 2, 2, 58, 316, 334, 2, 2, 1716, 2, 645, 662, 2, 257, 85, 1200, 2, 1228, 2578, 83, 68, 3912, 2, 2, 165, 1539, 278, 2, 69, 2, 780, 2, 106, 2, 2, 1338, 2, 2, 2, 2, 215, 2, 610, 2, 2, 87, 326, 2, 2300, 2, 2, 2, 2, 272, 2, 57, 2, 2, 2, 2, 2, 2, 2307, 51, 2, 170, 2, 595, 116, 595, 1352, 2, 191, 79, 638, 89, 2, 2, 2, 2, 106, 607, 624, 2, 534, 2, 227, 2, 129, 113]),\n",
       "       list([2, 2, 2, 2, 2, 2804, 2, 2040, 432, 111, 153, 103, 2, 1494, 2, 70, 131, 67, 2, 61, 2, 744, 2, 3715, 761, 61, 2, 452, 2, 2, 985, 2, 2, 59, 166, 2, 105, 216, 1239, 2, 1797, 2, 2, 2, 2, 744, 2413, 2, 2, 2, 687, 2, 2, 2, 2, 2, 3693, 2, 2, 2, 121, 59, 456, 2, 2, 2, 265, 2, 575, 111, 153, 159, 59, 2, 1447, 2, 2, 586, 482, 2, 2, 96, 59, 716, 2, 2, 172, 65, 2, 579, 2, 2, 2, 1615, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 464, 2, 314, 2, 2, 2, 719, 605, 2, 2, 202, 2, 310, 2, 3772, 3501, 2, 2722, 58, 2, 2, 537, 2116, 180, 2, 2, 413, 173, 2, 263, 112, 2, 152, 377, 2, 537, 263, 846, 579, 178, 54, 75, 71, 476, 2, 413, 263, 2504, 182, 2, 2, 75, 2306, 922, 2, 279, 131, 2895, 2, 2867, 2, 2, 2, 921, 2, 192, 2, 1219, 3890, 2, 2, 217, 4122, 1710, 537, 2, 1236, 2, 736, 2, 2, 61, 403, 2, 2, 2, 61, 4494, 2, 2, 4494, 159, 90, 263, 2311, 4319, 309, 2, 178, 2, 82, 4319, 2, 65, 2, 2, 145, 143, 2, 2, 2, 537, 746, 537, 537, 2, 2, 2, 2, 594, 2, 2, 94, 2, 3987, 2, 2, 2, 2, 538, 2, 1795, 246, 2, 2, 2, 2, 635, 2, 2, 51, 408, 2, 94, 318, 1382, 2, 2, 2, 2683, 936, 2, 2, 2, 2, 2, 2, 2, 1885, 2, 1118, 2, 80, 126, 842, 2, 2, 2, 2, 4726, 2, 4494, 2, 1550, 3633, 159, 2, 341, 2, 2733, 2, 4185, 173, 2, 90, 2, 2, 2, 2, 2, 1784, 86, 1117, 2, 3261, 2, 2, 2, 2, 2, 2, 2841, 2, 2, 1010, 2, 793, 2, 2, 1386, 1830, 2, 2, 246, 50, 2, 2, 2750, 1944, 746, 90, 2, 2, 2, 124, 2, 882, 2, 882, 496, 2, 2, 2213, 537, 121, 127, 1219, 130, 2, 2, 494, 2, 124, 2, 882, 496, 2, 341, 2, 2, 846, 2, 2, 2, 2, 1906, 2, 97, 2, 236, 2, 1311, 2, 2, 2, 2, 2, 2, 2, 91, 2, 3987, 70, 2, 882, 2, 579, 2, 2, 2, 2, 2, 537, 2, 2, 2, 2, 65, 2, 537, 75, 2, 1775, 3353, 2, 1846, 2, 2, 2, 154, 2, 2, 518, 53, 2, 2, 2, 3211, 882, 2, 399, 2, 75, 257, 3807, 2, 2, 2, 2, 456, 2, 65, 2, 2, 205, 113, 2, 2, 2, 2, 2, 2, 2, 242, 2, 91, 1202, 2, 2, 2070, 307, 2, 2, 2, 126, 93, 2, 2, 2, 188, 1076, 3222, 2, 2, 2, 2, 2348, 537, 2, 53, 537, 2, 82, 2, 2, 2, 2, 2, 280, 2, 219, 2, 2, 431, 758, 859, 2, 953, 1052, 2, 2, 2, 2, 94, 2, 2, 238, 60, 2, 2, 2, 804, 2, 2, 2, 2, 132, 2, 67, 2, 2, 2, 2, 283, 2, 2, 2, 2, 2, 242, 955, 2, 2, 279, 2, 2, 2, 1685, 195, 2, 238, 60, 796, 2, 2, 671, 2, 2804, 2, 2, 559, 154, 888, 2, 726, 50, 2, 2, 2, 2, 566, 2, 579, 2, 64, 2574]),\n",
       "       list([2, 249, 1323, 2, 61, 113, 2, 2, 2, 1637, 2, 2, 56, 2, 2401, 2, 457, 88, 2, 2626, 1400, 2, 3171, 2, 70, 79, 2, 706, 919, 2, 2, 355, 340, 355, 1696, 96, 143, 2, 2, 2, 289, 2, 61, 369, 71, 2359, 2, 2, 2, 131, 2073, 249, 114, 249, 229, 249, 2, 2, 2, 126, 110, 2, 473, 2, 569, 61, 419, 56, 429, 2, 1513, 2, 2, 534, 95, 474, 570, 2, 2, 124, 138, 88, 2, 421, 1543, 52, 725, 2, 61, 419, 2, 2, 1571, 2, 1543, 2, 2, 2, 2, 2, 296, 2, 3524, 2, 2, 421, 128, 74, 233, 334, 207, 126, 224, 2, 562, 298, 2167, 1272, 2, 2601, 2, 516, 988, 2, 2, 79, 120, 2, 595, 2, 784, 2, 3171, 2, 165, 170, 143, 2, 2, 2, 2, 2, 226, 251, 2, 61, 113]),\n",
       "       list([2, 778, 128, 74, 2, 630, 163, 2, 2, 1766, 2, 1051, 2, 2, 85, 156, 2, 2, 148, 139, 121, 664, 665, 2, 2, 1361, 173, 2, 749, 2, 2, 3804, 2, 2, 226, 65, 2, 2, 127, 2, 2, 2, 2])],\n",
       "      dtype=object)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x_train[0:6] # 0 reserved for padding; 1 would be starting character; 2 is unknown; 3 is most common word, etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "218\n",
      "189\n",
      "141\n",
      "550\n",
      "147\n",
      "43\n"
     ]
    }
   ],
   "source": [
    "for x in x_train[0:6]:\n",
    "    print(len(x))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 0, 0, 1, 0, 0])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_train[0:6]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(25000, 25000)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(x_train), len(x_valid)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Restoring words from index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "word_index = keras.datasets.imdb.get_word_index()\n",
    "word_index = {k:(v+3) for k,v in word_index.items()}\n",
    "word_index[\"PAD\"] = 0\n",
    "word_index[\"START\"] = 1\n",
    "word_index[\"UNK\"] = 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'fawn': 34704,\n",
       " 'tsukino': 52009,\n",
       " 'nunnery': 52010,\n",
       " 'sonja': 16819,\n",
       " 'vani': 63954,\n",
       " 'woods': 1411,\n",
       " 'spiders': 16118,\n",
       " 'hanging': 2348,\n",
       " 'woody': 2292,\n",
       " 'trawling': 52011,\n",
       " \"hold's\": 52012,\n",
       " 'comically': 11310,\n",
       " 'localized': 40833,\n",
       " 'disobeying': 30571,\n",
       " \"'royale\": 52013,\n",
       " \"harpo's\": 40834,\n",
       " 'canet': 52014,\n",
       " 'aileen': 19316,\n",
       " 'acurately': 52015,\n",
       " \"diplomat's\": 52016,\n",
       " 'rickman': 25245,\n",
       " 'arranged': 6749,\n",
       " 'rumbustious': 52017,\n",
       " 'familiarness': 52018,\n",
       " \"spider'\": 52019,\n",
       " 'hahahah': 68807,\n",
       " \"wood'\": 52020,\n",
       " 'transvestism': 40836,\n",
       " \"hangin'\": 34705,\n",
       " 'bringing': 2341,\n",
       " 'seamier': 40837,\n",
       " 'wooded': 34706,\n",
       " 'bravora': 52021,\n",
       " 'grueling': 16820,\n",
       " 'wooden': 1639,\n",
       " 'wednesday': 16821,\n",
       " \"'prix\": 52022,\n",
       " 'altagracia': 34707,\n",
       " 'circuitry': 52023,\n",
       " 'crotch': 11588,\n",
       " 'busybody': 57769,\n",
       " \"tart'n'tangy\": 52024,\n",
       " 'burgade': 14132,\n",
       " 'thrace': 52026,\n",
       " \"tom's\": 11041,\n",
       " 'snuggles': 52028,\n",
       " 'francesco': 29117,\n",
       " 'complainers': 52030,\n",
       " 'templarios': 52128,\n",
       " '272': 40838,\n",
       " '273': 52031,\n",
       " 'zaniacs': 52133,\n",
       " '275': 34709,\n",
       " 'consenting': 27634,\n",
       " 'snuggled': 40839,\n",
       " 'inanimate': 15495,\n",
       " 'uality': 52033,\n",
       " 'bronte': 11929,\n",
       " 'errors': 4013,\n",
       " 'dialogs': 3233,\n",
       " \"yomada's\": 52034,\n",
       " \"madman's\": 34710,\n",
       " 'dialoge': 30588,\n",
       " 'usenet': 52036,\n",
       " 'videodrome': 40840,\n",
       " \"kid'\": 26341,\n",
       " 'pawed': 52037,\n",
       " \"'girlfriend'\": 30572,\n",
       " \"'pleasure\": 52038,\n",
       " \"'reloaded'\": 52039,\n",
       " \"kazakos'\": 40842,\n",
       " 'rocque': 52040,\n",
       " 'mailings': 52041,\n",
       " 'brainwashed': 11930,\n",
       " 'mcanally': 16822,\n",
       " \"tom''\": 52042,\n",
       " 'kurupt': 25246,\n",
       " 'affiliated': 21908,\n",
       " 'babaganoosh': 52043,\n",
       " \"noe's\": 40843,\n",
       " 'quart': 40844,\n",
       " 'kids': 362,\n",
       " 'uplifting': 5037,\n",
       " 'controversy': 7096,\n",
       " 'kida': 21909,\n",
       " 'kidd': 23382,\n",
       " \"error'\": 52044,\n",
       " 'neurologist': 52045,\n",
       " 'spotty': 18513,\n",
       " 'cobblers': 30573,\n",
       " 'projection': 9881,\n",
       " 'fastforwarding': 40845,\n",
       " 'sters': 52046,\n",
       " \"eggar's\": 52047,\n",
       " 'etherything': 52048,\n",
       " 'gateshead': 40846,\n",
       " 'airball': 34711,\n",
       " 'unsinkable': 25247,\n",
       " 'stern': 7183,\n",
       " \"cervi's\": 52049,\n",
       " 'dnd': 40847,\n",
       " 'dna': 11589,\n",
       " 'insecurity': 20601,\n",
       " \"'reboot'\": 52050,\n",
       " 'trelkovsky': 11040,\n",
       " 'jaekel': 52051,\n",
       " 'sidebars': 52052,\n",
       " \"sforza's\": 52053,\n",
       " 'distortions': 17636,\n",
       " 'mutinies': 52054,\n",
       " 'sermons': 30605,\n",
       " '7ft': 40849,\n",
       " 'boobage': 52055,\n",
       " \"o'bannon's\": 52056,\n",
       " 'populations': 23383,\n",
       " 'chulak': 52057,\n",
       " 'mesmerize': 27636,\n",
       " 'quinnell': 52058,\n",
       " 'yahoo': 10310,\n",
       " 'meteorologist': 52060,\n",
       " 'beswick': 42580,\n",
       " 'boorman': 15496,\n",
       " 'voicework': 40850,\n",
       " \"ster'\": 52061,\n",
       " 'blustering': 22925,\n",
       " 'hj': 52062,\n",
       " 'intake': 27637,\n",
       " 'morally': 5624,\n",
       " 'jumbling': 40852,\n",
       " 'bowersock': 52063,\n",
       " \"'porky's'\": 52064,\n",
       " 'gershon': 16824,\n",
       " 'ludicrosity': 40853,\n",
       " 'coprophilia': 52065,\n",
       " 'expressively': 40854,\n",
       " \"india's\": 19503,\n",
       " \"post's\": 34713,\n",
       " 'wana': 52066,\n",
       " 'wang': 5286,\n",
       " 'wand': 30574,\n",
       " 'wane': 25248,\n",
       " 'edgeways': 52324,\n",
       " 'titanium': 34714,\n",
       " 'pinta': 40855,\n",
       " 'want': 181,\n",
       " 'pinto': 30575,\n",
       " 'whoopdedoodles': 52068,\n",
       " 'tchaikovsky': 21911,\n",
       " 'travel': 2106,\n",
       " \"'victory'\": 52069,\n",
       " 'copious': 11931,\n",
       " 'gouge': 22436,\n",
       " \"chapters'\": 52070,\n",
       " 'barbra': 6705,\n",
       " 'uselessness': 30576,\n",
       " \"wan'\": 52071,\n",
       " 'assimilated': 27638,\n",
       " 'petiot': 16119,\n",
       " 'most\\x85and': 52072,\n",
       " 'dinosaurs': 3933,\n",
       " 'wrong': 355,\n",
       " 'seda': 52073,\n",
       " 'stollen': 52074,\n",
       " 'sentencing': 34715,\n",
       " 'ouroboros': 40856,\n",
       " 'assimilates': 40857,\n",
       " 'colorfully': 40858,\n",
       " 'glenne': 27639,\n",
       " 'dongen': 52075,\n",
       " 'subplots': 4763,\n",
       " 'kiloton': 52076,\n",
       " 'chandon': 23384,\n",
       " \"effect'\": 34716,\n",
       " 'snugly': 27640,\n",
       " 'kuei': 40859,\n",
       " 'welcomed': 9095,\n",
       " 'dishonor': 30074,\n",
       " 'concurrence': 52078,\n",
       " 'stoicism': 23385,\n",
       " \"guys'\": 14899,\n",
       " \"beroemd'\": 52080,\n",
       " 'butcher': 6706,\n",
       " \"melfi's\": 40860,\n",
       " 'aargh': 30626,\n",
       " 'playhouse': 20602,\n",
       " 'wickedly': 11311,\n",
       " 'fit': 1183,\n",
       " 'labratory': 52081,\n",
       " 'lifeline': 40862,\n",
       " 'screaming': 1930,\n",
       " 'fix': 4290,\n",
       " 'cineliterate': 52082,\n",
       " 'fic': 52083,\n",
       " 'fia': 52084,\n",
       " 'fig': 34717,\n",
       " 'fmvs': 52085,\n",
       " 'fie': 52086,\n",
       " 'reentered': 52087,\n",
       " 'fin': 30577,\n",
       " 'doctresses': 52088,\n",
       " 'fil': 52089,\n",
       " 'zucker': 12609,\n",
       " 'ached': 31934,\n",
       " 'counsil': 52091,\n",
       " 'paterfamilias': 52092,\n",
       " 'songwriter': 13888,\n",
       " 'shivam': 34718,\n",
       " 'hurting': 9657,\n",
       " 'effects': 302,\n",
       " 'slauther': 52093,\n",
       " \"'flame'\": 52094,\n",
       " 'sommerset': 52095,\n",
       " 'interwhined': 52096,\n",
       " 'whacking': 27641,\n",
       " 'bartok': 52097,\n",
       " 'barton': 8778,\n",
       " 'frewer': 21912,\n",
       " \"fi'\": 52098,\n",
       " 'ingrid': 6195,\n",
       " 'stribor': 30578,\n",
       " 'approporiately': 52099,\n",
       " 'wobblyhand': 52100,\n",
       " 'tantalisingly': 52101,\n",
       " 'ankylosaurus': 52102,\n",
       " 'parasites': 17637,\n",
       " 'childen': 52103,\n",
       " \"jenkins'\": 52104,\n",
       " 'metafiction': 52105,\n",
       " 'golem': 17638,\n",
       " 'indiscretion': 40863,\n",
       " \"reeves'\": 23386,\n",
       " \"inamorata's\": 57784,\n",
       " 'brittannica': 52107,\n",
       " 'adapt': 7919,\n",
       " \"russo's\": 30579,\n",
       " 'guitarists': 48249,\n",
       " 'abbott': 10556,\n",
       " 'abbots': 40864,\n",
       " 'lanisha': 17652,\n",
       " 'magickal': 40866,\n",
       " 'mattter': 52108,\n",
       " \"'willy\": 52109,\n",
       " 'pumpkins': 34719,\n",
       " 'stuntpeople': 52110,\n",
       " 'estimate': 30580,\n",
       " 'ugghhh': 40867,\n",
       " 'gameplay': 11312,\n",
       " \"wern't\": 52111,\n",
       " \"n'sync\": 40868,\n",
       " 'sickeningly': 16120,\n",
       " 'chiara': 40869,\n",
       " 'disturbed': 4014,\n",
       " 'portmanteau': 40870,\n",
       " 'ineffectively': 52112,\n",
       " \"duchonvey's\": 82146,\n",
       " \"nasty'\": 37522,\n",
       " 'purpose': 1288,\n",
       " 'lazers': 52115,\n",
       " 'lightened': 28108,\n",
       " 'kaliganj': 52116,\n",
       " 'popularism': 52117,\n",
       " \"damme's\": 18514,\n",
       " 'stylistics': 30581,\n",
       " 'mindgaming': 52118,\n",
       " 'spoilerish': 46452,\n",
       " \"'corny'\": 52120,\n",
       " 'boerner': 34721,\n",
       " 'olds': 6795,\n",
       " 'bakelite': 52121,\n",
       " 'renovated': 27642,\n",
       " 'forrester': 27643,\n",
       " \"lumiere's\": 52122,\n",
       " 'gaskets': 52027,\n",
       " 'needed': 887,\n",
       " 'smight': 34722,\n",
       " 'master': 1300,\n",
       " \"edie's\": 25908,\n",
       " 'seeber': 40871,\n",
       " 'hiya': 52123,\n",
       " 'fuzziness': 52124,\n",
       " 'genesis': 14900,\n",
       " 'rewards': 12610,\n",
       " 'enthrall': 30582,\n",
       " \"'about\": 40872,\n",
       " \"recollection's\": 52125,\n",
       " 'mutilated': 11042,\n",
       " 'fatherlands': 52126,\n",
       " \"fischer's\": 52127,\n",
       " 'positively': 5402,\n",
       " '270': 34708,\n",
       " 'ahmed': 34723,\n",
       " 'zatoichi': 9839,\n",
       " 'bannister': 13889,\n",
       " 'anniversaries': 52130,\n",
       " \"helm's\": 30583,\n",
       " \"'work'\": 52131,\n",
       " 'exclaimed': 34724,\n",
       " \"'unfunny'\": 52132,\n",
       " '274': 52032,\n",
       " 'feeling': 547,\n",
       " \"wanda's\": 52134,\n",
       " 'dolan': 33269,\n",
       " '278': 52136,\n",
       " 'peacoat': 52137,\n",
       " 'brawny': 40873,\n",
       " 'mishra': 40874,\n",
       " 'worlders': 40875,\n",
       " 'protags': 52138,\n",
       " 'skullcap': 52139,\n",
       " 'dastagir': 57599,\n",
       " 'affairs': 5625,\n",
       " 'wholesome': 7802,\n",
       " 'hymen': 52140,\n",
       " 'paramedics': 25249,\n",
       " 'unpersons': 52141,\n",
       " 'heavyarms': 52142,\n",
       " 'affaire': 52143,\n",
       " 'coulisses': 52144,\n",
       " 'hymer': 40876,\n",
       " 'kremlin': 52145,\n",
       " 'shipments': 30584,\n",
       " 'pixilated': 52146,\n",
       " \"'00s\": 30585,\n",
       " 'diminishing': 18515,\n",
       " 'cinematic': 1360,\n",
       " 'resonates': 14901,\n",
       " 'simplify': 40877,\n",
       " \"nature'\": 40878,\n",
       " 'temptresses': 40879,\n",
       " 'reverence': 16825,\n",
       " 'resonated': 19505,\n",
       " 'dailey': 34725,\n",
       " '2\\x85': 52147,\n",
       " 'treize': 27644,\n",
       " 'majo': 52148,\n",
       " 'kiya': 21913,\n",
       " 'woolnough': 52149,\n",
       " 'thanatos': 39800,\n",
       " 'sandoval': 35734,\n",
       " 'dorama': 40882,\n",
       " \"o'shaughnessy\": 52150,\n",
       " 'tech': 4991,\n",
       " 'fugitives': 32021,\n",
       " 'teck': 30586,\n",
       " \"'e'\": 76128,\n",
       " 'doesn’t': 40884,\n",
       " 'purged': 52152,\n",
       " 'saying': 660,\n",
       " \"martians'\": 41098,\n",
       " 'norliss': 23421,\n",
       " 'dickey': 27645,\n",
       " 'dicker': 52155,\n",
       " \"'sependipity\": 52156,\n",
       " 'padded': 8425,\n",
       " 'ordell': 57795,\n",
       " \"sturges'\": 40885,\n",
       " 'independentcritics': 52157,\n",
       " 'tempted': 5748,\n",
       " \"atkinson's\": 34727,\n",
       " 'hounded': 25250,\n",
       " 'apace': 52158,\n",
       " 'clicked': 15497,\n",
       " \"'humor'\": 30587,\n",
       " \"martino's\": 17180,\n",
       " \"'supporting\": 52159,\n",
       " 'warmongering': 52035,\n",
       " \"zemeckis's\": 34728,\n",
       " 'lube': 21914,\n",
       " 'shocky': 52160,\n",
       " 'plate': 7479,\n",
       " 'plata': 40886,\n",
       " 'sturgess': 40887,\n",
       " \"nerds'\": 40888,\n",
       " 'plato': 20603,\n",
       " 'plath': 34729,\n",
       " 'platt': 40889,\n",
       " 'mcnab': 52162,\n",
       " 'clumsiness': 27646,\n",
       " 'altogether': 3902,\n",
       " 'massacring': 42587,\n",
       " 'bicenntinial': 52163,\n",
       " 'skaal': 40890,\n",
       " 'droning': 14363,\n",
       " 'lds': 8779,\n",
       " 'jaguar': 21915,\n",
       " \"cale's\": 34730,\n",
       " 'nicely': 1780,\n",
       " 'mummy': 4591,\n",
       " \"lot's\": 18516,\n",
       " 'patch': 10089,\n",
       " 'kerkhof': 50205,\n",
       " \"leader's\": 52164,\n",
       " \"'movie\": 27647,\n",
       " 'uncomfirmed': 52165,\n",
       " 'heirloom': 40891,\n",
       " 'wrangle': 47363,\n",
       " 'emotion\\x85': 52166,\n",
       " \"'stargate'\": 52167,\n",
       " 'pinoy': 40892,\n",
       " 'conchatta': 40893,\n",
       " 'broeke': 41131,\n",
       " 'advisedly': 40894,\n",
       " \"barker's\": 17639,\n",
       " 'descours': 52169,\n",
       " 'lots': 775,\n",
       " 'lotr': 9262,\n",
       " 'irs': 9882,\n",
       " 'lott': 52170,\n",
       " 'xvi': 40895,\n",
       " 'irk': 34731,\n",
       " 'irl': 52171,\n",
       " 'ira': 6890,\n",
       " 'belzer': 21916,\n",
       " 'irc': 52172,\n",
       " 'ire': 27648,\n",
       " 'requisites': 40896,\n",
       " 'discipline': 7696,\n",
       " 'lyoko': 52964,\n",
       " 'extend': 11313,\n",
       " 'nature': 876,\n",
       " \"'dickie'\": 52173,\n",
       " 'optimist': 40897,\n",
       " 'lapping': 30589,\n",
       " 'superficial': 3903,\n",
       " 'vestment': 52174,\n",
       " 'extent': 2826,\n",
       " 'tendons': 52175,\n",
       " \"heller's\": 52176,\n",
       " 'quagmires': 52177,\n",
       " 'miyako': 52178,\n",
       " 'moocow': 20604,\n",
       " \"coles'\": 52179,\n",
       " 'lookit': 40898,\n",
       " 'ravenously': 52180,\n",
       " 'levitating': 40899,\n",
       " 'perfunctorily': 52181,\n",
       " 'lookin': 30590,\n",
       " \"lot'\": 40901,\n",
       " 'lookie': 52182,\n",
       " 'fearlessly': 34873,\n",
       " 'libyan': 52184,\n",
       " 'fondles': 40902,\n",
       " 'gopher': 35717,\n",
       " 'wearying': 40904,\n",
       " \"nz's\": 52185,\n",
       " 'minuses': 27649,\n",
       " 'puposelessly': 52186,\n",
       " 'shandling': 52187,\n",
       " 'decapitates': 31271,\n",
       " 'humming': 11932,\n",
       " \"'nother\": 40905,\n",
       " 'smackdown': 21917,\n",
       " 'underdone': 30591,\n",
       " 'frf': 40906,\n",
       " 'triviality': 52188,\n",
       " 'fro': 25251,\n",
       " 'bothers': 8780,\n",
       " \"'kensington\": 52189,\n",
       " 'much': 76,\n",
       " 'muco': 34733,\n",
       " 'wiseguy': 22618,\n",
       " \"richie's\": 27651,\n",
       " 'tonino': 40907,\n",
       " 'unleavened': 52190,\n",
       " 'fry': 11590,\n",
       " \"'tv'\": 40908,\n",
       " 'toning': 40909,\n",
       " 'obese': 14364,\n",
       " 'sensationalized': 30592,\n",
       " 'spiv': 40910,\n",
       " 'spit': 6262,\n",
       " 'arkin': 7367,\n",
       " 'charleton': 21918,\n",
       " 'jeon': 16826,\n",
       " 'boardroom': 21919,\n",
       " 'doubts': 4992,\n",
       " 'spin': 3087,\n",
       " 'hepo': 53086,\n",
       " 'wildcat': 27652,\n",
       " 'venoms': 10587,\n",
       " 'misconstrues': 52194,\n",
       " 'mesmerising': 18517,\n",
       " 'misconstrued': 40911,\n",
       " 'rescinds': 52195,\n",
       " 'prostrate': 52196,\n",
       " 'majid': 40912,\n",
       " 'climbed': 16482,\n",
       " 'canoeing': 34734,\n",
       " 'majin': 52198,\n",
       " 'animie': 57807,\n",
       " 'sylke': 40913,\n",
       " 'conditioned': 14902,\n",
       " 'waddell': 40914,\n",
       " '3\\x85': 52199,\n",
       " 'hyperdrive': 41191,\n",
       " 'conditioner': 34735,\n",
       " 'bricklayer': 53156,\n",
       " 'hong': 2579,\n",
       " 'memoriam': 52201,\n",
       " 'inventively': 30595,\n",
       " \"levant's\": 25252,\n",
       " 'portobello': 20641,\n",
       " 'remand': 52203,\n",
       " 'mummified': 19507,\n",
       " 'honk': 27653,\n",
       " 'spews': 19508,\n",
       " 'visitations': 40915,\n",
       " 'mummifies': 52204,\n",
       " 'cavanaugh': 25253,\n",
       " 'zeon': 23388,\n",
       " \"jungle's\": 40916,\n",
       " 'viertel': 34736,\n",
       " 'frenchmen': 27654,\n",
       " 'torpedoes': 52205,\n",
       " 'schlessinger': 52206,\n",
       " 'torpedoed': 34737,\n",
       " 'blister': 69879,\n",
       " 'cinefest': 52207,\n",
       " 'furlough': 34738,\n",
       " 'mainsequence': 52208,\n",
       " 'mentors': 40917,\n",
       " 'academic': 9097,\n",
       " 'stillness': 20605,\n",
       " 'academia': 40918,\n",
       " 'lonelier': 52209,\n",
       " 'nibby': 52210,\n",
       " \"losers'\": 52211,\n",
       " 'cineastes': 40919,\n",
       " 'corporate': 4452,\n",
       " 'massaging': 40920,\n",
       " 'bellow': 30596,\n",
       " 'absurdities': 19509,\n",
       " 'expetations': 53244,\n",
       " 'nyfiken': 40921,\n",
       " 'mehras': 75641,\n",
       " 'lasse': 52212,\n",
       " 'visability': 52213,\n",
       " 'militarily': 33949,\n",
       " \"elder'\": 52214,\n",
       " 'gainsbourg': 19026,\n",
       " 'hah': 20606,\n",
       " 'hai': 13423,\n",
       " 'haj': 34739,\n",
       " 'hak': 25254,\n",
       " 'hal': 4314,\n",
       " 'ham': 4895,\n",
       " 'duffer': 53262,\n",
       " 'haa': 52216,\n",
       " 'had': 69,\n",
       " 'advancement': 11933,\n",
       " 'hag': 16828,\n",
       " \"hand'\": 25255,\n",
       " 'hay': 13424,\n",
       " 'mcnamara': 20607,\n",
       " \"mozart's\": 52217,\n",
       " 'duffel': 30734,\n",
       " 'haq': 30597,\n",
       " 'har': 13890,\n",
       " 'has': 47,\n",
       " 'hat': 2404,\n",
       " 'hav': 40922,\n",
       " 'haw': 30598,\n",
       " 'figtings': 52218,\n",
       " 'elders': 15498,\n",
       " 'underpanted': 52219,\n",
       " 'pninson': 52220,\n",
       " 'unequivocally': 27655,\n",
       " \"barbara's\": 23676,\n",
       " \"bello'\": 52222,\n",
       " 'indicative': 13000,\n",
       " 'yawnfest': 40923,\n",
       " 'hexploitation': 52223,\n",
       " \"loder's\": 52224,\n",
       " 'sleuthing': 27656,\n",
       " \"justin's\": 32625,\n",
       " \"'ball\": 52225,\n",
       " \"'summer\": 52226,\n",
       " \"'demons'\": 34938,\n",
       " \"mormon's\": 52228,\n",
       " \"laughton's\": 34740,\n",
       " 'debell': 52229,\n",
       " 'shipyard': 39727,\n",
       " 'unabashedly': 30600,\n",
       " 'disks': 40404,\n",
       " 'crowd': 2293,\n",
       " 'crowe': 10090,\n",
       " \"vancouver's\": 56437,\n",
       " 'mosques': 34741,\n",
       " 'crown': 6630,\n",
       " 'culpas': 52230,\n",
       " 'crows': 27657,\n",
       " 'surrell': 53347,\n",
       " 'flowless': 52232,\n",
       " 'sheirk': 52233,\n",
       " \"'three\": 40926,\n",
       " \"peterson'\": 52234,\n",
       " 'ooverall': 52235,\n",
       " 'perchance': 40927,\n",
       " 'bottom': 1324,\n",
       " 'chabert': 53366,\n",
       " 'sneha': 52236,\n",
       " 'inhuman': 13891,\n",
       " 'ichii': 52237,\n",
       " 'ursla': 52238,\n",
       " 'completly': 30601,\n",
       " 'moviedom': 40928,\n",
       " 'raddick': 52239,\n",
       " 'brundage': 51998,\n",
       " 'brigades': 40929,\n",
       " 'starring': 1184,\n",
       " \"'goal'\": 52240,\n",
       " 'caskets': 52241,\n",
       " 'willcock': 52242,\n",
       " \"threesome's\": 52243,\n",
       " \"mosque'\": 52244,\n",
       " \"cover's\": 52245,\n",
       " 'spaceships': 17640,\n",
       " 'anomalous': 40930,\n",
       " 'ptsd': 27658,\n",
       " 'shirdan': 52246,\n",
       " 'obscenity': 21965,\n",
       " 'lemmings': 30602,\n",
       " 'duccio': 30603,\n",
       " \"levene's\": 52247,\n",
       " \"'gorby'\": 52248,\n",
       " \"teenager's\": 25258,\n",
       " 'marshall': 5343,\n",
       " 'honeymoon': 9098,\n",
       " 'shoots': 3234,\n",
       " 'despised': 12261,\n",
       " 'okabasho': 52249,\n",
       " 'fabric': 8292,\n",
       " 'cannavale': 18518,\n",
       " 'raped': 3540,\n",
       " \"tutt's\": 52250,\n",
       " 'grasping': 17641,\n",
       " 'despises': 18519,\n",
       " \"thief's\": 40931,\n",
       " 'rapes': 8929,\n",
       " 'raper': 52251,\n",
       " \"eyre'\": 27659,\n",
       " 'walchek': 52252,\n",
       " \"elmo's\": 23389,\n",
       " 'perfumes': 40932,\n",
       " 'spurting': 21921,\n",
       " \"exposition'\\x85\": 52253,\n",
       " 'denoting': 52254,\n",
       " 'thesaurus': 34743,\n",
       " \"shoot'\": 40933,\n",
       " 'bonejack': 49762,\n",
       " 'simpsonian': 52256,\n",
       " 'hebetude': 30604,\n",
       " \"hallow's\": 34744,\n",
       " 'desperation\\x85': 52257,\n",
       " 'incinerator': 34745,\n",
       " 'congratulations': 10311,\n",
       " 'humbled': 52258,\n",
       " \"else's\": 5927,\n",
       " 'trelkovski': 40848,\n",
       " \"rape'\": 52259,\n",
       " \"'chapters'\": 59389,\n",
       " '1600s': 52260,\n",
       " 'martian': 7256,\n",
       " 'nicest': 25259,\n",
       " 'eyred': 52262,\n",
       " 'passenger': 9460,\n",
       " 'disgrace': 6044,\n",
       " 'moderne': 52263,\n",
       " 'barrymore': 5123,\n",
       " 'yankovich': 52264,\n",
       " 'moderns': 40934,\n",
       " 'studliest': 52265,\n",
       " 'bedsheet': 52266,\n",
       " 'decapitation': 14903,\n",
       " 'slurring': 52267,\n",
       " \"'nunsploitation'\": 52268,\n",
       " \"'character'\": 34746,\n",
       " 'cambodia': 9883,\n",
       " 'rebelious': 52269,\n",
       " 'pasadena': 27660,\n",
       " 'crowne': 40935,\n",
       " \"'bedchamber\": 52270,\n",
       " 'conjectural': 52271,\n",
       " 'appologize': 52272,\n",
       " 'halfassing': 52273,\n",
       " 'paycheque': 57819,\n",
       " 'palms': 20609,\n",
       " \"'islands\": 52274,\n",
       " 'hawked': 40936,\n",
       " 'palme': 21922,\n",
       " 'conservatively': 40937,\n",
       " 'larp': 64010,\n",
       " 'palma': 5561,\n",
       " 'smelling': 21923,\n",
       " 'aragorn': 13001,\n",
       " 'hawker': 52275,\n",
       " 'hawkes': 52276,\n",
       " 'explosions': 3978,\n",
       " 'loren': 8062,\n",
       " \"pyle's\": 52277,\n",
       " 'shootout': 6707,\n",
       " \"mike's\": 18520,\n",
       " \"driscoll's\": 52278,\n",
       " 'cogsworth': 40938,\n",
       " \"britian's\": 52279,\n",
       " 'childs': 34747,\n",
       " \"portrait's\": 52280,\n",
       " 'chain': 3629,\n",
       " 'whoever': 2500,\n",
       " 'puttered': 52281,\n",
       " 'childe': 52282,\n",
       " 'maywether': 52283,\n",
       " 'chair': 3039,\n",
       " \"rance's\": 52284,\n",
       " 'machu': 34748,\n",
       " 'ballet': 4520,\n",
       " 'grapples': 34749,\n",
       " 'summerize': 76155,\n",
       " 'freelance': 30606,\n",
       " \"andrea's\": 52286,\n",
       " '\\x91very': 52287,\n",
       " 'coolidge': 45882,\n",
       " 'mache': 18521,\n",
       " 'balled': 52288,\n",
       " 'grappled': 40940,\n",
       " 'macha': 18522,\n",
       " 'underlining': 21924,\n",
       " 'macho': 5626,\n",
       " 'oversight': 19510,\n",
       " 'machi': 25260,\n",
       " 'verbally': 11314,\n",
       " 'tenacious': 21925,\n",
       " 'windshields': 40941,\n",
       " 'paychecks': 18560,\n",
       " 'jerk': 3399,\n",
       " \"good'\": 11934,\n",
       " 'prancer': 34751,\n",
       " 'prances': 21926,\n",
       " 'olympus': 52289,\n",
       " 'lark': 21927,\n",
       " 'embark': 10788,\n",
       " 'gloomy': 7368,\n",
       " 'jehaan': 52290,\n",
       " 'turaqui': 52291,\n",
       " \"child'\": 20610,\n",
       " 'locked': 2897,\n",
       " 'pranced': 52292,\n",
       " 'exact': 2591,\n",
       " 'unattuned': 52293,\n",
       " 'minute': 786,\n",
       " 'skewed': 16121,\n",
       " 'hodgins': 40943,\n",
       " 'skewer': 34752,\n",
       " 'think\\x85': 52294,\n",
       " 'rosenstein': 38768,\n",
       " 'helmit': 52295,\n",
       " 'wrestlemanias': 34753,\n",
       " 'hindered': 16829,\n",
       " \"martha's\": 30607,\n",
       " 'cheree': 52296,\n",
       " \"pluckin'\": 52297,\n",
       " 'ogles': 40944,\n",
       " 'heavyweight': 11935,\n",
       " 'aada': 82193,\n",
       " 'chopping': 11315,\n",
       " 'strongboy': 61537,\n",
       " 'hegemonic': 41345,\n",
       " 'adorns': 40945,\n",
       " 'xxth': 41349,\n",
       " 'nobuhiro': 34754,\n",
       " 'capitães': 52301,\n",
       " 'kavogianni': 52302,\n",
       " 'antwerp': 13425,\n",
       " 'celebrated': 6541,\n",
       " 'roarke': 52303,\n",
       " 'baggins': 40946,\n",
       " 'cheeseburgers': 31273,\n",
       " 'matras': 52304,\n",
       " \"nineties'\": 52305,\n",
       " \"'craig'\": 52306,\n",
       " 'celebrates': 13002,\n",
       " 'unintentionally': 3386,\n",
       " 'drafted': 14365,\n",
       " 'climby': 52307,\n",
       " '303': 52308,\n",
       " 'oldies': 18523,\n",
       " 'climbs': 9099,\n",
       " 'honour': 9658,\n",
       " 'plucking': 34755,\n",
       " '305': 30077,\n",
       " 'address': 5517,\n",
       " 'menjou': 40947,\n",
       " \"'freak'\": 42595,\n",
       " 'dwindling': 19511,\n",
       " 'benson': 9461,\n",
       " 'white’s': 52310,\n",
       " 'shamelessness': 40948,\n",
       " 'impacted': 21928,\n",
       " 'upatz': 52311,\n",
       " 'cusack': 3843,\n",
       " \"flavia's\": 37570,\n",
       " 'effette': 52312,\n",
       " 'influx': 34756,\n",
       " 'boooooooo': 52313,\n",
       " 'dimitrova': 52314,\n",
       " 'houseman': 13426,\n",
       " 'bigas': 25262,\n",
       " 'boylen': 52315,\n",
       " 'phillipenes': 52316,\n",
       " 'fakery': 40949,\n",
       " \"grandpa's\": 27661,\n",
       " 'darnell': 27662,\n",
       " 'undergone': 19512,\n",
       " 'handbags': 52318,\n",
       " 'perished': 21929,\n",
       " 'pooped': 37781,\n",
       " 'vigour': 27663,\n",
       " 'opposed': 3630,\n",
       " 'etude': 52319,\n",
       " \"caine's\": 11802,\n",
       " 'doozers': 52320,\n",
       " 'photojournals': 34757,\n",
       " 'perishes': 52321,\n",
       " 'constrains': 34758,\n",
       " 'migenes': 40951,\n",
       " 'consoled': 30608,\n",
       " 'alastair': 16830,\n",
       " 'wvs': 52322,\n",
       " 'ooooooh': 52323,\n",
       " 'approving': 34759,\n",
       " 'consoles': 40952,\n",
       " 'disparagement': 52067,\n",
       " 'futureistic': 52325,\n",
       " 'rebounding': 52326,\n",
       " \"'date\": 52327,\n",
       " 'gregoire': 52328,\n",
       " 'rutherford': 21930,\n",
       " 'americanised': 34760,\n",
       " 'novikov': 82199,\n",
       " 'following': 1045,\n",
       " 'munroe': 34761,\n",
       " \"morita'\": 52329,\n",
       " 'christenssen': 52330,\n",
       " 'oatmeal': 23109,\n",
       " 'fossey': 25263,\n",
       " 'livered': 40953,\n",
       " 'listens': 13003,\n",
       " \"'marci\": 76167,\n",
       " \"otis's\": 52333,\n",
       " 'thanking': 23390,\n",
       " 'maude': 16022,\n",
       " 'extensions': 34762,\n",
       " 'ameteurish': 52335,\n",
       " \"commender's\": 52336,\n",
       " 'agricultural': 27664,\n",
       " 'convincingly': 4521,\n",
       " 'fueled': 17642,\n",
       " 'mahattan': 54017,\n",
       " \"paris's\": 40955,\n",
       " 'vulkan': 52339,\n",
       " 'stapes': 52340,\n",
       " 'odysessy': 52341,\n",
       " 'harmon': 12262,\n",
       " 'surfing': 4255,\n",
       " 'halloran': 23497,\n",
       " 'unbelieveably': 49583,\n",
       " \"'offed'\": 52342,\n",
       " 'quadrant': 30610,\n",
       " 'inhabiting': 19513,\n",
       " 'nebbish': 34763,\n",
       " 'forebears': 40956,\n",
       " 'skirmish': 34764,\n",
       " 'ocassionally': 52343,\n",
       " \"'resist\": 52344,\n",
       " 'impactful': 21931,\n",
       " 'spicier': 52345,\n",
       " 'touristy': 40957,\n",
       " \"'football'\": 52346,\n",
       " 'webpage': 40958,\n",
       " 'exurbia': 52348,\n",
       " 'jucier': 52349,\n",
       " 'professors': 14904,\n",
       " 'structuring': 34765,\n",
       " 'jig': 30611,\n",
       " 'overlord': 40959,\n",
       " 'disconnect': 25264,\n",
       " 'sniffle': 82204,\n",
       " 'slimeball': 40960,\n",
       " 'jia': 40961,\n",
       " 'milked': 16831,\n",
       " 'banjoes': 40962,\n",
       " 'jim': 1240,\n",
       " 'workforces': 52351,\n",
       " 'jip': 52352,\n",
       " 'rotweiller': 52353,\n",
       " 'mundaneness': 34766,\n",
       " \"'ninja'\": 52354,\n",
       " \"dead'\": 11043,\n",
       " \"cipriani's\": 40963,\n",
       " 'modestly': 20611,\n",
       " \"professor'\": 52355,\n",
       " 'shacked': 40964,\n",
       " 'bashful': 34767,\n",
       " 'sorter': 23391,\n",
       " 'overpowering': 16123,\n",
       " 'workmanlike': 18524,\n",
       " 'henpecked': 27665,\n",
       " 'sorted': 18525,\n",
       " \"jōb's\": 52357,\n",
       " \"'always\": 52358,\n",
       " \"'baptists\": 34768,\n",
       " 'dreamcatchers': 52359,\n",
       " \"'silence'\": 52360,\n",
       " 'hickory': 21932,\n",
       " 'fun\\x97yet': 52361,\n",
       " 'breakumentary': 52362,\n",
       " 'didn': 15499,\n",
       " 'didi': 52363,\n",
       " 'pealing': 52364,\n",
       " 'dispite': 40965,\n",
       " \"italy's\": 25265,\n",
       " 'instability': 21933,\n",
       " 'quarter': 6542,\n",
       " 'quartet': 12611,\n",
       " 'padmé': 52365,\n",
       " \"'bleedmedry\": 52366,\n",
       " 'pahalniuk': 52367,\n",
       " 'honduras': 52368,\n",
       " 'bursting': 10789,\n",
       " \"pablo's\": 41468,\n",
       " 'irremediably': 52370,\n",
       " 'presages': 40966,\n",
       " 'bowlegged': 57835,\n",
       " 'dalip': 65186,\n",
       " 'entering': 6263,\n",
       " 'newsradio': 76175,\n",
       " 'presaged': 54153,\n",
       " \"giallo's\": 27666,\n",
       " 'bouyant': 40967,\n",
       " 'amerterish': 52371,\n",
       " 'rajni': 18526,\n",
       " 'leeves': 30613,\n",
       " 'macauley': 34770,\n",
       " 'seriously': 615,\n",
       " 'sugercoma': 52372,\n",
       " 'grimstead': 52373,\n",
       " \"'fairy'\": 52374,\n",
       " 'zenda': 30614,\n",
       " \"'twins'\": 52375,\n",
       " 'realisation': 17643,\n",
       " 'highsmith': 27667,\n",
       " 'raunchy': 7820,\n",
       " 'incentives': 40968,\n",
       " 'flatson': 52377,\n",
       " 'snooker': 35100,\n",
       " 'crazies': 16832,\n",
       " 'crazier': 14905,\n",
       " 'grandma': 7097,\n",
       " 'napunsaktha': 52378,\n",
       " 'workmanship': 30615,\n",
       " 'reisner': 52379,\n",
       " \"sanford's\": 61309,\n",
       " '\\x91doña': 52380,\n",
       " 'modest': 6111,\n",
       " \"everything's\": 19156,\n",
       " 'hamer': 40969,\n",
       " \"couldn't'\": 52382,\n",
       " 'quibble': 13004,\n",
       " 'socking': 52383,\n",
       " 'tingler': 21934,\n",
       " 'gutman': 52384,\n",
       " 'lachlan': 40970,\n",
       " 'tableaus': 52385,\n",
       " 'headbanger': 52386,\n",
       " 'spoken': 2850,\n",
       " 'cerebrally': 34771,\n",
       " \"'road\": 23493,\n",
       " 'tableaux': 21935,\n",
       " \"proust's\": 40971,\n",
       " 'periodical': 40972,\n",
       " \"shoveller's\": 52388,\n",
       " 'tamara': 25266,\n",
       " 'affords': 17644,\n",
       " 'concert': 3252,\n",
       " \"yara's\": 87958,\n",
       " 'someome': 52389,\n",
       " 'lingering': 8427,\n",
       " \"abraham's\": 41514,\n",
       " 'beesley': 34772,\n",
       " 'cherbourg': 34773,\n",
       " 'kagan': 28627,\n",
       " 'snatch': 9100,\n",
       " \"miyazaki's\": 9263,\n",
       " 'absorbs': 25267,\n",
       " \"koltai's\": 40973,\n",
       " 'tingled': 64030,\n",
       " 'crossroads': 19514,\n",
       " 'rehab': 16124,\n",
       " 'falworth': 52392,\n",
       " 'sequals': 52393,\n",
       " ...}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "word_index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "index_word = {v:k for k,v in word_index.items()}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[2,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 530,\n",
       " 973,\n",
       " 1622,\n",
       " 1385,\n",
       " 65,\n",
       " 458,\n",
       " 4468,\n",
       " 66,\n",
       " 3941,\n",
       " 2,\n",
       " 173,\n",
       " 2,\n",
       " 256,\n",
       " 2,\n",
       " 2,\n",
       " 100,\n",
       " 2,\n",
       " 838,\n",
       " 112,\n",
       " 50,\n",
       " 670,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 480,\n",
       " 284,\n",
       " 2,\n",
       " 150,\n",
       " 2,\n",
       " 172,\n",
       " 112,\n",
       " 167,\n",
       " 2,\n",
       " 336,\n",
       " 385,\n",
       " 2,\n",
       " 2,\n",
       " 172,\n",
       " 4536,\n",
       " 1111,\n",
       " 2,\n",
       " 546,\n",
       " 2,\n",
       " 2,\n",
       " 447,\n",
       " 2,\n",
       " 192,\n",
       " 50,\n",
       " 2,\n",
       " 2,\n",
       " 147,\n",
       " 2025,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 1920,\n",
       " 4613,\n",
       " 469,\n",
       " 2,\n",
       " 2,\n",
       " 71,\n",
       " 87,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 530,\n",
       " 2,\n",
       " 76,\n",
       " 2,\n",
       " 2,\n",
       " 1247,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 515,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 626,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 62,\n",
       " 386,\n",
       " 2,\n",
       " 2,\n",
       " 316,\n",
       " 2,\n",
       " 106,\n",
       " 2,\n",
       " 2,\n",
       " 2223,\n",
       " 2,\n",
       " 2,\n",
       " 480,\n",
       " 66,\n",
       " 3785,\n",
       " 2,\n",
       " 2,\n",
       " 130,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 619,\n",
       " 2,\n",
       " 2,\n",
       " 124,\n",
       " 51,\n",
       " 2,\n",
       " 135,\n",
       " 2,\n",
       " 2,\n",
       " 1415,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 215,\n",
       " 2,\n",
       " 77,\n",
       " 52,\n",
       " 2,\n",
       " 2,\n",
       " 407,\n",
       " 2,\n",
       " 82,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 107,\n",
       " 117,\n",
       " 2,\n",
       " 2,\n",
       " 256,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 3766,\n",
       " 2,\n",
       " 723,\n",
       " 2,\n",
       " 71,\n",
       " 2,\n",
       " 530,\n",
       " 476,\n",
       " 2,\n",
       " 400,\n",
       " 317,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 1029,\n",
       " 2,\n",
       " 104,\n",
       " 88,\n",
       " 2,\n",
       " 381,\n",
       " 2,\n",
       " 297,\n",
       " 98,\n",
       " 2,\n",
       " 2071,\n",
       " 56,\n",
       " 2,\n",
       " 141,\n",
       " 2,\n",
       " 194,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 226,\n",
       " 2,\n",
       " 2,\n",
       " 134,\n",
       " 476,\n",
       " 2,\n",
       " 480,\n",
       " 2,\n",
       " 144,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 51,\n",
       " 2,\n",
       " 2,\n",
       " 224,\n",
       " 92,\n",
       " 2,\n",
       " 104,\n",
       " 2,\n",
       " 226,\n",
       " 65,\n",
       " 2,\n",
       " 2,\n",
       " 1334,\n",
       " 88,\n",
       " 2,\n",
       " 2,\n",
       " 283,\n",
       " 2,\n",
       " 2,\n",
       " 4472,\n",
       " 113,\n",
       " 103,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 2,\n",
       " 178,\n",
       " 2]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x_train[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"UNK UNK UNK UNK UNK brilliant casting location scenery story direction everyone's really suited UNK part UNK played UNK UNK could UNK imagine being there robert UNK UNK UNK amazing actor UNK now UNK same being director UNK father came UNK UNK same scottish island UNK myself UNK UNK loved UNK fact there UNK UNK real connection UNK UNK UNK UNK witty remarks throughout UNK UNK were great UNK UNK UNK brilliant UNK much UNK UNK bought UNK UNK UNK soon UNK UNK UNK released UNK UNK UNK would recommend UNK UNK everyone UNK watch UNK UNK fly UNK UNK amazing really cried UNK UNK end UNK UNK UNK sad UNK UNK know what UNK say UNK UNK cry UNK UNK UNK UNK must UNK been good UNK UNK definitely UNK also UNK UNK UNK two little UNK UNK played UNK UNK UNK norman UNK paul UNK were UNK brilliant children UNK often left UNK UNK UNK UNK list UNK think because UNK stars UNK play them UNK grown up UNK such UNK big UNK UNK UNK whole UNK UNK these children UNK amazing UNK should UNK UNK UNK what UNK UNK done don't UNK think UNK whole story UNK UNK lovely because UNK UNK true UNK UNK someone's life after UNK UNK UNK UNK UNK us UNK\""
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "' '.join(index_word[id] for id in x_train[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "(all_x_train,_),(all_x_valid,_) = imdb.load_data() "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"START this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert redford's is an amazing actor and now the same being director norman's father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for retail and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also congratulations to the two little boy's that played the part's of norman and paul they were just brilliant children are often left out of the praising list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all\""
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "' '.join(index_word[id] for id in all_x_train[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Preprocess data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "x_train = pad_sequences(x_train, maxlen=max_review_length, \n",
    "                        padding=pad_type, truncating=trunc_type, value=0)\n",
    "x_valid = pad_sequences(x_valid, maxlen=max_review_length, \n",
    "                        padding=pad_type, truncating=trunc_type, value=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1415,    2,    2,    2,    2,  215,    2,   77,   52,    2,    2,\n",
       "         407,    2,   82,    2,    2,    2,  107,  117,    2,    2,  256,\n",
       "           2,    2,    2, 3766,    2,  723,    2,   71,    2,  530,  476,\n",
       "           2,  400,  317,    2,    2,    2,    2, 1029,    2,  104,   88,\n",
       "           2,  381,    2,  297,   98,    2, 2071,   56,    2,  141,    2,\n",
       "         194,    2,    2,    2,  226,    2,    2,  134,  476,    2,  480,\n",
       "           2,  144,    2,    2,    2,   51,    2,    2,  224,   92,    2,\n",
       "         104,    2,  226,   65,    2,    2, 1334,   88,    2,    2,  283,\n",
       "           2,    2, 4472,  113,  103,    2,    2,    2,    2,    2,  178,\n",
       "           2],\n",
       "       [ 163,    2, 3215,    2,    2, 1153,    2,  194,  775,    2,    2,\n",
       "           2,  349, 2637,  148,  605,    2,    2,    2,  123,  125,   68,\n",
       "           2,    2,    2,  349,  165, 4362,   98,    2,    2,  228,    2,\n",
       "           2,    2, 1157,    2,  299,  120,    2,  120,  174,    2,  220,\n",
       "         175,  136,   50,    2, 4373,  228,    2,    2,    2,  656,  245,\n",
       "        2350,    2,    2,    2,  131,  152,  491,    2,    2,    2,    2,\n",
       "        1212,    2,    2,    2,  371,   78,    2,  625,   64, 1382,    2,\n",
       "           2,  168,  145,    2,    2, 1690,    2,    2,    2, 1355,    2,\n",
       "           2,    2,   52,  154,  462,    2,   89,   78,  285,    2,  145,\n",
       "          95],\n",
       "       [1301,    2, 1873,    2,   89,   78,    2,   66,    2,    2,  360,\n",
       "           2,    2,   58,  316,  334,    2,    2, 1716,    2,  645,  662,\n",
       "           2,  257,   85, 1200,    2, 1228, 2578,   83,   68, 3912,    2,\n",
       "           2,  165, 1539,  278,    2,   69,    2,  780,    2,  106,    2,\n",
       "           2, 1338,    2,    2,    2,    2,  215,    2,  610,    2,    2,\n",
       "          87,  326,    2, 2300,    2,    2,    2,    2,  272,    2,   57,\n",
       "           2,    2,    2,    2,    2,    2, 2307,   51,    2,  170,    2,\n",
       "         595,  116,  595, 1352,    2,  191,   79,  638,   89,    2,    2,\n",
       "           2,    2,  106,  607,  624,    2,  534,    2,  227,    2,  129,\n",
       "         113],\n",
       "       [   2,    2,    2,  188, 1076, 3222,    2,    2,    2,    2, 2348,\n",
       "         537,    2,   53,  537,    2,   82,    2,    2,    2,    2,    2,\n",
       "         280,    2,  219,    2,    2,  431,  758,  859,    2,  953, 1052,\n",
       "           2,    2,    2,    2,   94,    2,    2,  238,   60,    2,    2,\n",
       "           2,  804,    2,    2,    2,    2,  132,    2,   67,    2,    2,\n",
       "           2,    2,  283,    2,    2,    2,    2,    2,  242,  955,    2,\n",
       "           2,  279,    2,    2,    2, 1685,  195,    2,  238,   60,  796,\n",
       "           2,    2,  671,    2, 2804,    2,    2,  559,  154,  888,    2,\n",
       "         726,   50,    2,    2,    2,    2,  566,    2,  579,    2,   64,\n",
       "        2574],\n",
       "       [   2,    2,  131, 2073,  249,  114,  249,  229,  249,    2,    2,\n",
       "           2,  126,  110,    2,  473,    2,  569,   61,  419,   56,  429,\n",
       "           2, 1513,    2,    2,  534,   95,  474,  570,    2,    2,  124,\n",
       "         138,   88,    2,  421, 1543,   52,  725,    2,   61,  419,    2,\n",
       "           2, 1571,    2, 1543,    2,    2,    2,    2,    2,  296,    2,\n",
       "        3524,    2,    2,  421,  128,   74,  233,  334,  207,  126,  224,\n",
       "           2,  562,  298, 2167, 1272,    2, 2601,    2,  516,  988,    2,\n",
       "           2,   79,  120,    2,  595,    2,  784,    2, 3171,    2,  165,\n",
       "         170,  143,    2,    2,    2,    2,    2,  226,  251,    2,   61,\n",
       "         113],\n",
       "       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,\n",
       "           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,\n",
       "           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,\n",
       "           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,\n",
       "           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,\n",
       "           0,    0,    2,  778,  128,   74,    2,  630,  163,    2,    2,\n",
       "        1766,    2, 1051,    2,    2,   85,  156,    2,    2,  148,  139,\n",
       "         121,  664,  665,    2,    2, 1361,  173,    2,  749,    2,    2,\n",
       "        3804,    2,    2,  226,   65,    2,    2,  127,    2,    2,    2,\n",
       "           2]], dtype=int32)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x_train[0:6]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100\n",
      "100\n",
      "100\n",
      "100\n",
      "100\n",
      "100\n"
     ]
    }
   ],
   "source": [
    "for x in x_train[0:6]:\n",
    "    print(len(x))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"cry UNK UNK UNK UNK must UNK been good UNK UNK definitely UNK also UNK UNK UNK two little UNK UNK played UNK UNK UNK norman UNK paul UNK were UNK brilliant children UNK often left UNK UNK UNK UNK list UNK think because UNK stars UNK play them UNK grown up UNK such UNK big UNK UNK UNK whole UNK UNK these children UNK amazing UNK should UNK UNK UNK what UNK UNK done don't UNK think UNK whole story UNK UNK lovely because UNK UNK true UNK UNK someone's life after UNK UNK UNK UNK UNK us UNK\""
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "' '.join(index_word[id] for id in x_train[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD PAD UNK begins better than UNK ends funny UNK UNK russian UNK crew UNK UNK other actors UNK UNK those scenes where documentary shots UNK UNK spoiler part UNK message UNK UNK contrary UNK UNK whole story UNK UNK does UNK UNK UNK UNK'"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "' '.join(index_word[id] for id in x_train[5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "#### Design neural network architecture"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Sequential()\n",
    "model.add(Embedding(n_unique_words, n_dim, input_length=max_review_length))\n",
    "model.add(Flatten())\n",
    "model.add(Dense(n_dense, activation='relu'))\n",
    "model.add(Dropout(dropout))\n",
    "# model.add(Dense(n_dense, activation='relu'))\n",
    "# model.add(Dropout(dropout))\n",
    "model.add(Dense(1, activation='sigmoid')) # mathematically equivalent to softmax with two classes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "_________________________________________________________________\n",
      "Layer (type)                 Output Shape              Param #   \n",
      "=================================================================\n",
      "embedding_1 (Embedding)      (None, 100, 64)           320000    \n",
      "_________________________________________________________________\n",
      "flatten_1 (Flatten)          (None, 6400)              0         \n",
      "_________________________________________________________________\n",
      "dense_1 (Dense)              (None, 64)                409664    \n",
      "_________________________________________________________________\n",
      "dropout_1 (Dropout)          (None, 64)                0         \n",
      "_________________________________________________________________\n",
      "dense_2 (Dense)              (None, 1)                 65        \n",
      "=================================================================\n",
      "Total params: 729,729\n",
      "Trainable params: 729,729\n",
      "Non-trainable params: 0\n",
      "_________________________________________________________________\n"
     ]
    }
   ],
   "source": [
    "model.summary() # so many parameters!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(64, 5000, 320000)"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# embedding layer dimensions and parameters: \n",
    "n_dim, n_unique_words, n_dim*n_unique_words"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(100, 64, 6400)"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# ...flatten:\n",
    "max_review_length, n_dim, n_dim*max_review_length"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(64, 409664)"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# ...dense:\n",
    "n_dense, n_dim*max_review_length*n_dense + n_dense # weights + biases"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "65"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# ...and output:\n",
    "n_dense + 1 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Configure model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "modelcheckpoint = ModelCheckpoint(filepath=output_dir+\n",
    "                                  \"/weights.{epoch:02d}.hdf5\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not os.path.exists(output_dir):\n",
    "    os.makedirs(output_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Train!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train on 25000 samples, validate on 25000 samples\n",
      "Epoch 1/4\n",
      "25000/25000 [==============================] - 2s 80us/step - loss: 0.5612 - acc: 0.6892 - val_loss: 0.3630 - val_acc: 0.8398\n",
      "Epoch 2/4\n",
      "25000/25000 [==============================] - 2s 69us/step - loss: 0.2851 - acc: 0.8841 - val_loss: 0.3486 - val_acc: 0.8447\n",
      "Epoch 3/4\n",
      "25000/25000 [==============================] - 2s 70us/step - loss: 0.1158 - acc: 0.9646 - val_loss: 0.4252 - val_acc: 0.8337\n",
      "Epoch 4/4\n",
      "25000/25000 [==============================] - 2s 70us/step - loss: 0.0237 - acc: 0.9961 - val_loss: 0.5304 - val_acc: 0.8340\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<keras.callbacks.History at 0x7f996ad994e0>"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model.fit(x_train, y_train, \n",
    "          batch_size=batch_size, epochs=epochs, verbose=1, \n",
    "          validation_data=(x_valid, y_valid), \n",
    "          callbacks=[modelcheckpoint])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Evaluate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.load_weights(output_dir+\"/weights.02.hdf5\") # NOT zero-indexed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_hat = model.predict_proba(x_valid)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "25000"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(y_hat)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.08968422], dtype=float32)"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_hat[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_valid[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAD8CAYAAAB+UHOxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAFTRJREFUeJzt3W2QXuV93/Hvz8iQ2LEtAQtDJVHhseIGe8aY7IBcz6SO5QqBM4gXkMrTBJlRq05K89RMG9y+UAtmBveJlJmYVA1qhCexrNA4aGwaqgoYt52CEYZgA2G0BgJbUbRBQm7KGEfk3xf3JbKIXe1ZafdeL+f7mdm5z/mf65xzXazY330e7vukqpAk9c87FroDkqSFYQBIUk8ZAJLUUwaAJPWUASBJPWUASFJPGQCS1FMGgCT1lAEgST21ZKE7cCJnn312rVq1aqG7Ib3V954evL73gwvbD2kKjzzyyJ9V1chM7X6oA2DVqlXs27dvobshvdV/+8Tg9VMPLGQvpCkl+dMu7TwFJEk9ZQBIUk8ZAJLUUwaAJPWUASBJPWUASFJPGQCS1FMGgCT1lAEgST3V6ZPASX4V+HtAAd8GrgPOA3YCZwLfAn6+qn6Q5AzgTuAngZeBv1NVz7XtfA7YDLwO/FJV3TunoznOqhu+Pp+bn9Zzt3x6QfYrSbMx4xFAkuXALwGjVfVh4DRgI/AF4NaqWg0cZvCHnfZ6uKo+ANza2pHkwrbeh4D1wBeTnDa3w5EkddX1FNAS4EeTLAHeBbwIfBK4qy3fAVzVpje0edrytUnS6jur6rWqehYYAy459SFIkk7GjAFQVf8b+DfA8wz+8B8BHgFeqaqjrdk4sLxNLwdeaOsebe3PmlyfYp03JNmSZF+SfRMTEyczJklSB11OAS1j8O79AuCvAe8GLp+iaR1bZZpl09XfXKjaVlWjVTU6MjLjt5lKkk5Sl1NAnwKeraqJqvoL4A+AvwksbaeEAFYAB9r0OLASoC1/H3Bocn2KdSRJQ9YlAJ4H1iR5VzuXvxZ4ErgfuLq12QTc3aZ3t3na8vuqqlp9Y5IzklwArAa+OTfDkCTN1oy3gVbVQ0nuYnCr51HgUWAb8HVgZ5LPt9odbZU7gC8lGWPwzn9j284TSXYxCI+jwPVV9focj0eS1FGnzwFU1VZg63HlZ5jiLp6q+j5wzTTbuRm4eZZ9lCTNAz8JLEk9ZQBIUk8ZAJLUUwaAJPWUASBJPWUASFJPGQCS1FMGgCT1lAEgST1lAEhSTxkAktRTBoAk9ZQBIEk9ZQBIUk8ZAJLUUwaAJPVUl4fCfzDJY5N+vpfkV5KcmWRPkv3tdVlrnyS3JRlL8niSiydta1Nrvz/Jpun3KkmabzMGQFU9XVUXVdVFwE8CrwJfBW4A9lbVamBvmwe4nMHzflcDW4DbAZKcyeCpYpcyeJLY1mOhIUkavtmeAloLfLeq/hTYAOxo9R3AVW16A3BnDTwILE1yHnAZsKeqDlXVYWAPsP6URyBJOimzDYCNwJfb9LlV9SJAez2n1ZcDL0xaZ7zVpqtLkhZA5wBIcjpwJfD7MzWdolYnqB+/ny1J9iXZNzEx0bV7kqRZms0RwOXAt6rqpTb/Uju1Q3s92OrjwMpJ660ADpyg/iZVta2qRqtqdGRkZBbdkyTNxmwC4DP81ekfgN3AsTt5NgF3T6pf2+4GWgMcaaeI7gXWJVnWLv6uazVJ0gJY0qVRkncBfxv4B5PKtwC7kmwGngeuafV7gCuAMQZ3DF0HUFWHktwEPNza3VhVh055BJKkk9IpAKrqVeCs42ovM7gr6Pi2BVw/zXa2A9tn301J0lzzk8CS1FMGgCT1lAEgST1lAEhST3W6CCxJfbTqhq8v2L6fu+XT874PjwAkqacMAEnqKQNAknrKAJCknjIAJKmnDABJ6ikDQJJ6ygCQpJ4yACSppwwASeopA0CSesoAkKSe6hQASZYmuSvJnyR5KsnHkpyZZE+S/e11WWubJLclGUvyeJKLJ21nU2u/P8mm6fcoSZpvXY8A/j3wR1X1N4CPAE8BNwB7q2o1sLfNA1wOrG4/W4DbAZKcCWwFLgUuAbYeCw1J0vDNGABJ3gv8FHAHQFX9oKpeATYAO1qzHcBVbXoDcGcNPAgsTXIecBmwp6oOVdVhYA+wfk5HI0nqrMsRwPuBCeA/JXk0yW8neTdwblW9CNBez2ntlwMvTFp/vNWmq79Jki1J9iXZNzExMesBSZK66RIAS4CLgdur6qPA/+OvTvdMJVPU6gT1NxeqtlXVaFWNjoyMdOieJOlkdAmAcWC8qh5q83cxCISX2qkd2uvBSe1XTlp/BXDgBHVJ0gKYMQCq6v8ALyT5YCutBZ4EdgPH7uTZBNzdpncD17a7gdYAR9oponuBdUmWtYu/61pNkrQAuj4T+BeB301yOvAMcB2D8NiVZDPwPHBNa3sPcAUwBrza2lJVh5LcBDzc2t1YVYfmZBSSpFnrFABV9RgwOsWitVO0LeD6abazHdg+mw5KkuaHnwSWpJ4yACSppwwASeopA0CSesoAkKSeMgAkqacMAEnqKQNAknrKAJCknjIAJKmnDABJ6ikDQJJ6ygCQpJ4yACSppwwASeopA0CSeqpTACR5Lsm3kzyWZF+rnZlkT5L97XVZqyfJbUnGkjye5OJJ29nU2u9Psmm6/UmS5t9sjgB+uqouqqpjTwa7AdhbVauBvW0e4HJgdfvZAtwOg8AAtgKXApcAW4+FhiRp+E7lFNAGYEeb3gFcNal+Zw08CCxNch5wGbCnqg5V1WFgD7D+FPYvSToFXQOggP+a5JEkW1rt3Kp6EaC9ntPqy4EXJq073mrT1SVJC6DTQ+GBj1fVgSTnAHuS/MkJ2maKWp2g/uaVBwGzBeD888/v2D1J0mx1OgKoqgPt9SDwVQbn8F9qp3Zorwdb83Fg5aTVVwAHTlA/fl/bqmq0qkZHRkZmNxpJUmczBkCSdyd5z7FpYB3wHWA3cOxOnk3A3W16N3BtuxtoDXCknSK6F1iXZFm7+Luu1SRJC6DLKaBzga8mOdb+96rqj5I8DOxKshl4Hrimtb8HuAIYA14FrgOoqkNJbgIebu1urKpDczYSSdKszBgAVfUM8JEp6i8Da6eoF3D9NNvaDmyffTclSXPNTwJLUk8ZAJLUUwaAJPWUASBJPWUASFJPGQCS1FMGgCT1lAEgST1lAEhSTxkAktRTBoAk9ZQBIEk9ZQBIUk8ZAJLUUwaAJPWUASBJPdU5AJKcluTRJF9r8xckeSjJ/iRfSXJ6q5/R5sfa8lWTtvG5Vn86yWVzPRhJUnezOQL4ZeCpSfNfAG6tqtXAYWBzq28GDlfVB4BbWzuSXAhsBD4ErAe+mOS0U+u+JOlkdQqAJCuATwO/3eYDfBK4qzXZAVzVpje0edryta39BmBnVb1WVc8yeGbwJXMxCEnS7HU9AvgN4J8Cf9nmzwJeqaqjbX4cWN6mlwMvALTlR1r7N+pTrCNJGrIZAyDJzwAHq+qRyeUpmtYMy060zuT9bUmyL8m+iYmJmbonSTpJXY4APg5cmeQ5YCeDUz+/ASxNsqS1WQEcaNPjwEqAtvx9wKHJ9SnWeUNVbauq0aoaHRkZmfWAJEndzBgAVfW5qlpRVasYXMS9r6r+LnA/cHVrtgm4u03vbvO05fdVVbX6xnaX0AXAauCbczYSSdKsLJm5ybR+HdiZ5PPAo8AdrX4H8KUkYwze+W8EqKonkuwCngSOAtdX1eunsH9J0imYVQBU1QPAA236Gaa4i6eqvg9cM836NwM3z7aTkqS55yeBJamnDABJ6ikDQJJ6ygCQpJ4yACSppwwASeopA0CSesoAkKSeMgAkqacMAEnqKQNAknrKAJCknjIAJKmnDABJ6ikDQJJ6ygCQpJ4yACSpp2YMgCQ/kuSbSf44yRNJ/mWrX5DkoST7k3wlyemtfkabH2vLV03a1uda/ekkl83XoCRJM+tyBPAa8Mmq+ghwEbA+yRrgC8CtVbUaOAxsbu03A4er6gPAra0dSS5k8HzgDwHrgS8mOW0uByNJ6m7GAKiBP2+z72w/BXwSuKvVdwBXtekNbZ62fG2StPrOqnqtqp4FxpjimcKSpOHodA0gyWlJHgMOAnuA7wKvVNXR1mQcWN6mlwMvALTlR4CzJtenWEeSNGSdAqCqXq+qi4AVDN61/8RUzdprplk2Xf1NkmxJsi/JvomJiS7dkySdhFndBVRVrwAPAGuApUmWtEUrgANtehxYCdCWvw84NLk+xTqT97GtqkaranRkZGQ23ZMkzUKXu4BGkixt0z8KfAp4CrgfuLo12wTc3aZ3t3na8vuqqlp9Y7tL6AJgNfDNuRqIJGl2lszchPOAHe2OnXcAu6rqa0meBHYm+TzwKHBHa38H8KUkYwze+W8EqKonkuwCngSOAtdX1etzOxxJUlczBkBVPQ58dIr6M0xxF09VfR+4Zppt3QzcPPtuSpLmmp8ElqSeMgAkqacMAEnqKQNAknrKAJCknjIAJKmnDABJ6ikDQJJ6ygCQpJ4yACSppwwASeopA0CSesoAkKSeMgAkqacMAEnqKQNAknqqyyMhVya5P8lTSZ5I8sutfmaSPUn2t9dlrZ4ktyUZS/J4kosnbWtTa78/yabp9ilJmn9dHgl5FPi1qvpWkvcAjyTZA3wW2FtVtyS5AbgB+HXgcgbP+10NXArcDlya5ExgKzAKVNvO7qo6PNeDkvT2suqGry90F96WZjwCqKoXq+pbbfr/Mngg/HJgA7CjNdsBXNWmNwB31sCDwNIk5wGXAXuq6lD7o78HWD+no5EkdTarawBJVjF4PvBDwLlV9SIMQgI4pzVbDrwwabXxVpuuLklaAF1OAQGQ5MeA/wz8SlV9L8m0Taeo1Qnqx+9nC7AF4Pzzz+/avR8qC3W4+twtn16Q/UpanDodASR5J4M//r9bVX/Qyi+1Uzu014OtPg6snLT6CuDACepvUlXbqmq0qkZHRkZmMxZJ0ix0uQsowB3AU1X17yYt2g0cu5NnE3D3pPq17W6gNcCRdoroXmBdkmXtjqF1rSZJWgBdTgF9HPh54NtJHmu1fwbcAuxKshl4HrimLbsHuAIYA14FrgOoqkNJbgIebu1urKpDczIKSdKszRgAVfU/mPr8PcDaKdoXcP0029oObJ9NByVJ88NPAktSTxkAktRTBoAk9ZQBIEk9ZQBIUk8ZAJLUUwaAJPWUASBJPWUASFJPGQCS1FMGgCT1lAEgST1lAEhSTxkAktRTnR8JKUkL9bhTzQ+PACSppzwCeBtZyHdnPpBeWny6PBN4e5KDSb4zqXZmkj1J9rfXZa2eJLclGUvyeJKLJ62zqbXfn2TTVPuSJA1Pl1NAvwOsP652A7C3qlYDe9s8wOXA6vazBbgdBoEBbAUuBS4Bth4LDUnSwpgxAKrqG8DxD2/fAOxo0zuAqybV76yBB4GlSc4DLgP2VNWhqjoM7OGtoSJJGqKTvQh8blW9CNBez2n15cALk9qNt9p09bdIsiXJviT7JiYmTrJ7kqSZzPVF4ExRqxPU31qs2gZsAxgdHZ2yjX74LNQFaC8+SyfvZAPgpSTnVdWL7RTPwVYfB1ZOarcCONDqnziu/sBJ7lvqNe/F11w52VNAu4Fjd/JsAu6eVL+23Q20BjjSThHdC6xLsqxd/F3XapKkBTLjEUCSLzN49352knEGd/PcAuxKshl4HrimNb8HuAIYA14FrgOoqkNJbgIebu1urKrjLyxLs7ZQ74Z3vv9lADb6blyL2IwBUFWfmWbR2inaFnD9NNvZDmyfVe8kSfPGr4KQpJ4yACSppwwASeopA0CSesoAkKSeMgAkqacMAEnqKQNAknrKAJCknjIAJKmnDABJ6ikDQJJ6ygCQpJ4yACSppwwASeopA0CSemroAZBkfZKnk4wluWHY+5ckDQw1AJKcBvwmcDlwIfCZJBcOsw+SpIFhHwFcAoxV1TNV9QNgJ7BhyH2QJDH8AFgOvDBpfrzVJElDNuND4edYpqjVmxokW4AtbfbPkzx9Cvs7G/izU1h/senbeGGBxvyxN6Z+Zti7Bn/PvZAvnNKY/3qXRsMOgHFg5aT5FcCByQ2qahuwbS52lmRfVY3OxbYWg76NFxxzXzjm+THsU0APA6uTXJDkdGAjsHvIfZAkMeQjgKo6muQfAfcCpwHbq+qJYfZBkjQw7FNAVNU9wD1D2t2cnEpaRPo2XnDMfeGY50GqauZWkqS3Hb8KQpJ6atEHwExfLZHkjCRfacsfSrJq+L2cWx3G/I+TPJnk8SR7k3S6JeyHWdevEElydZJKsujvGOky5iQ/237XTyT5vWH3ca51+Ld9fpL7kzza/n1fsRD9nCtJtic5mOQ70yxPktvaf4/Hk1w8px2oqkX7w+BC8neB9wOnA38MXHhcm38I/Fab3gh8ZaH7PYQx/zTwrjb9C30Yc2v3HuAbwIPA6EL3ewi/59XAo8CyNn/OQvd7CGPeBvxCm74QeG6h+32KY/4p4GLgO9MsvwL4Lww+Q7UGeGgu97/YjwC6fLXEBmBHm74LWJtkqg+kLRYzjrmq7q+qV9vsgww+b7GYdf0KkZuAfwV8f5idmyddxvz3gd+sqsMAVXVwyH2ca13GXMB72/T7OO5zRItNVX0DOHSCJhuAO2vgQWBpkvPmav+LPQC6fLXEG22q6ihwBDhrKL2bH7P9Oo3NDN5BLGYzjjnJR4GVVfW1YXZsHnX5Pf848ONJ/meSB5OsH1rv5keXMf8L4OeSjDO4m/AXh9O1BTOvX58z9NtA59iMXy3Rsc1i0nk8SX4OGAX+1rz2aP6dcMxJ3gHcCnx2WB0agi6/5yUMTgN9gsFR3n9P8uGqemWe+zZfuoz5M8DvVNW/TfIx4EttzH85/91bEPP692uxHwHM+NUSk9skWcLgsPFEh1w/7LqMmSSfAv45cGVVvTakvs2Xmcb8HuDDwANJnmNwrnT3Ir8Q3PXf9t1V9RdV9SzwNINAWKy6jHkzsAugqv4X8CMMvifo7arT/+8na7EHQJevltgNbGrTVwP3Vbu6skjNOOZ2OuQ/MPjjv9jPC8MMY66qI1V1dlWtqqpVDK57XFlV+xamu3Oiy7/tP2RwwZ8kZzM4JfTMUHs5t7qM+XlgLUCSn2AQABND7eVw7QaubXcDrQGOVNWLc7XxRX0KqKb5aokkNwL7qmo3cAeDw8QxBu/8Ny5cj09dxzH/a+DHgN9v17ufr6orF6zTp6jjmN9WOo75XmBdkieB14F/UlUvL1yvT03HMf8a8B+T/CqDUyGfXcxv6JJ8mcEpvLPbdY2twDsBquq3GFznuAIYA14FrpvT/S/i/3aSpFOw2E8BSZJOkgEgST1lAEhSTxkAktRTBoAk9ZQBIEk9ZQBIUk8ZAJLUU/8fsGlgTkeyC2IAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x7f996ad99550>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.hist(y_hat)\n",
    "_ = plt.axvline(x=0.5, color='orange')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "pct_auc = roc_auc_score(y_valid, y_hat)*100.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'92.85'"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"{:0.2f}\".format(pct_auc)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "float_y_hat = []\n",
    "for y in y_hat:\n",
    "    float_y_hat.append(y[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "ydf = pd.DataFrame(list(zip(float_y_hat, y_valid)), columns=['y_hat', 'y'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>y_hat</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.089684</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.982754</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.746905</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.543328</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.997054</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.833994</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.766254</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.008032</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.812743</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0.729463</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      y_hat  y\n",
       "0  0.089684  0\n",
       "1  0.982754  1\n",
       "2  0.746905  1\n",
       "3  0.543328  0\n",
       "4  0.997054  1\n",
       "5  0.833994  1\n",
       "6  0.766254  1\n",
       "7  0.008032  0\n",
       "8  0.812743  0\n",
       "9  0.729463  1"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ydf.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"START please give this one a miss br br kristy swanson and the rest of the cast rendered terrible performances the show is flat flat flat br br i don't know how michael madison could have allowed this one on his plate he almost seemed to know this wasn't going to work out and his performance was quite lacklustre so all you madison fans give this a miss\""
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "' '.join(index_word[id] for id in all_x_valid[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"START originally supposed to be just a part of a huge epic the year 1905 depicting the revolution of 1905 potemkin is the story of the mutiny of the crew of the potemkin in odessa harbor the film opens with the crew protesting meat and the captain ordering the execution of the an uprising takes place during which the revolutionary leader is killed this crewman is taken to the shore to lie in state when the townspeople gather on a huge flight of steps overlooking the harbor czarist troops appear and march down the steps breaking up the crowd a naval squadron is sent to retake the potemkin but at the moment when the ships come into range their crews allow the to pass through eisenstein's non historically accurate ending is open ended thus indicating that this was the seed of the later bolshevik revolution that would bloom in russia the film is broken into five parts men and maggots drama on the an appeal from the dead the odessa steps and meeting the squadron br br eisenstein was a revolutionary artist but at the genius level not wanting to make a historical drama eisenstein used visual texture to give the film a newsreel look so that the viewer feels he is eavesdropping on a thrilling and politically revolutionary story this technique is used by the battle of algiers br br unlike eisenstein relied on or the casting of non professionals who had striking physical appearances the extraordinary faces of the cast are what one remembers from potemkin this technique is later used by frank capra in mr deeds goes to town and meet john doe but in potemkin no one individual is cast as a hero or heroine the story is told through a series of scenes that are combined in a special effect known as montage the editing and selection of short segments to produce a desired effect on the viewer d w griffith also used the montage but no one mastered it so well as eisenstein br br the artistic filming of the crew sleeping in their is complemented by the graceful swinging of tables suspended from chains in the galley in contrast the confrontation between the crew and their officers is charged with electricity and the clenched fists of the masses demonstrate their rage with injustice br br eisenstein introduced the technique of showing an action and repeating it again but from a slightly different angle to demonstrate intensity the breaking of a plate bearing the words give us this day our daily bread signifies the beginning of the end this technique is used in last year at marienbad also when the ship's surgeon is tossed over the side his nez dangles from the rigging it was these glasses that the officer used to inspect and pass the maggot infested meat this sequence ties the punishment to the corruption of the czarist era br br the most noted sequence in the film and perhaps in all of film history is the odessa steps the broad expanse of the steps are filled with hundreds of extras rapid and dramatic violence is always suggested and not explicit yet the visual images of the deaths of a few will last in the minds of the viewer forever br br the angular shots of marching boots and legs descending the steps are cleverly accentuated with long menacing shadows from a sun at the top of the steps the pace of the sequence is deliberately varied between the marching soldiers and a few civilians who summon up courage to beg them to stop a close up of a woman's face frozen in horror after being struck by a soldier's sword is the direct antecedent of the bank teller in bonnie in clyde and gives a lasting impression of the horror of the czarist regime br br the death of a young mother leads to a baby carriage careening down the steps in a sequence that has been copied by hitchcock in foreign correspondent by terry gilliam in brazil and brian depalma in the untouchables this sequence is shown repeatedly from various angles thus drawing out what probably was only a five second event br br potemkin is a film that the revolutionary spirit celebrates it for those already committed and it for the unconverted it seethes of fire and roars with the senseless injustices of the decadent czarist regime its greatest impact has been on film students who have borrowed and only slightly improved on techniques invented in russia several generations ago\""
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "' '.join(index_word[id] for id in all_x_valid[6]) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>y_hat</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>112</th>\n",
       "      <td>0.927986</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>152</th>\n",
       "      <td>0.927225</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>386</th>\n",
       "      <td>0.970688</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>426</th>\n",
       "      <td>0.906968</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>547</th>\n",
       "      <td>0.944135</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>614</th>\n",
       "      <td>0.916398</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>680</th>\n",
       "      <td>0.931426</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>693</th>\n",
       "      <td>0.970964</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>710</th>\n",
       "      <td>0.928176</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>740</th>\n",
       "      <td>0.988633</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        y_hat  y\n",
       "112  0.927986  0\n",
       "152  0.927225  0\n",
       "386  0.970688  0\n",
       "426  0.906968  0\n",
       "547  0.944135  0\n",
       "614  0.916398  0\n",
       "680  0.931426  0\n",
       "693  0.970964  0\n",
       "710  0.928176  0\n",
       "740  0.988633  0"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ydf[(ydf.y == 0) & (ydf.y_hat > 0.9)].head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"START wow another kevin costner hero movie postman tin cup waterworld bodyguard wyatt earp robin hood even that baseball movie seems like he makes movies specifically to be the center of attention the characters are almost always the same the heroics the flaws the greatness the fall the redemption yup within the 1st 5 minutes of the movie we're all supposed to be in awe of his character and it builds up more and more from there br br and this time the story story is just a collage of different movies you don't need a spoiler you've seen this movie several times though it had different titles you'll know what will happen way before it happens this is like mixing an officer and a gentleman with but both are easily better movies watch to see how this kind of movie should be made and also to see how an good but slightly underrated actor russell plays the hero\""
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "' '.join(index_word[id] for id in all_x_valid[386]) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>y_hat</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>0.099175</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100</th>\n",
       "      <td>0.072720</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>101</th>\n",
       "      <td>0.027467</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>224</th>\n",
       "      <td>0.058242</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>242</th>\n",
       "      <td>0.098803</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>248</th>\n",
       "      <td>0.074462</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>300</th>\n",
       "      <td>0.065653</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>325</th>\n",
       "      <td>0.066929</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>333</th>\n",
       "      <td>0.045406</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>345</th>\n",
       "      <td>0.043948</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        y_hat  y\n",
       "22   0.099175  1\n",
       "100  0.072720  1\n",
       "101  0.027467  1\n",
       "224  0.058242  1\n",
       "242  0.098803  1\n",
       "248  0.074462  1\n",
       "300  0.065653  1\n",
       "325  0.066929  1\n",
       "333  0.045406  1\n",
       "345  0.043948  1"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ydf[(ydf.y == 1) & (ydf.y_hat < 0.1)].head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"START finally a true horror movie this is the first time in years that i had to cover my eyes i am a horror buff and i recommend this movie but it is quite gory i am not a big wrestling fan but kane really pulled the whole monster thing off i have to admit that i didn't want to see this movie my 17 year old dragged me to it but am very glad i did during and after the movie i was looking over my shoulder i have to agree with others about the whole remake horror movies enough is enough i think that is why this movie is getting some good reviews it is a refreshing change and takes you back to the texas chainsaw first one michael myers and jason and no cgi crap\""
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "' '.join(index_word[id] for id in all_x_valid[224]) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
